Auto-encoders: reconstruction versus compression

Author

  • Yann Ollivier
Abstract

We discuss the similarities and differences between training an auto-encoder to minimize the reconstruction error, and training the same auto-encoder to compress the data via a generative model.

Given a dataset, auto-encoders (for instance, [PH87, Section 8.1] or [HS06]) aim at building a hopefully simpler representation of the data via a hidden, usually lower-dimensional feature space. This is done by looking for a pair of maps f: X → Y and g: Y → X, from data space X to feature space Y and back, such that the reconstruction error between x and g(f(x)) is small. Identifying relevant features hopefully makes the data more understandable, more compact, or simpler to describe.

Here we take this interpretation literally, by considering auto-encoders in the framework of minimum description length, i.e., data compression via a probabilistic generative model, using the general correspondence between compression and “simple” probability distributions on the data (for instance, [Grü07]). The objective is then to minimize the codelength (log-likelihood) of the data using the features found by the auto-encoder. The goal is not to explicitly build an encoding of the data [Grü07]; rather, it is to find a good pair of feature and generative functions that would yield a short codelength. If the codelength is known as a function of the parameters of the auto-encoder, it can be used as the training criterion.

This note answers two questions. First, do auto-encoders trained to minimize reconstruction error actually minimize the length of a compressed encoding of the data, or an approximation thereof? Second, if the goal is to compress the data, why should the auto-encoder approach help compared to directly looking for a generative model? We will see that by adding an information-theoretic term to the reconstruction error, auto-encoders can be trained to minimize a tight upper bound on the codelength (compressed size) of the data. In most situations, directly working with the codelength is difficult (Section 2), hence the interest of such bounds. In Section 3 we introduce a first, simple bound on codelength based on reconstruction error; it is not tight and is valid only for discrete features, but it already illustrates how minimizing codelength favors using fewer features.
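To make the compression viewpoint concrete, here is a minimal sketch of such a bound, based only on the standard correspondence between probabilities and codelengths and not necessarily identical to the bounds introduced in the paper. Encode a data point x with a two-part code: first encode the feature value f(x) under some probability distribution p_Y on the feature space, then encode x under a conditional model p(· | g(f(x))) centered on the reconstruction. Writing L(x) for the resulting codelength in bits,

\[
L(x) \;\le\; -\log_2 p_Y\bigl(f(x)\bigr) \;-\; \log_2 p\bigl(x \mid g(f(x))\bigr),
\]

where p_Y and the conditional model p are illustrative choices, not prescribed by the auto-encoder itself. With a Gaussian conditional model the second term is, up to constants, a squared reconstruction error, so training on reconstruction error alone amounts to ignoring the first term; it is the feature term −log_2 p_Y(f(x)) that penalizes using many, or overly precise, features.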


Similar articles

A Pitfall of Unsupervised Pre-Training

In this paper we thoroughly investigate the quality of features produced by deep neural network architectures obtained by stacking and convolving Auto-Encoders. In particular, we are interested in the relation between their reconstruction score and their performance on document layout analysis. When using Auto-Encoders, intuitively one could assume that features which are good for reconstruction ...


Conditional Probability Models for Deep Image Compression

Deep Neural Networks trained as image auto-encoders have recently emerged as a promising direction for advancing the state of the art in image compression. The key challenge in learning such networks is twofold: to deal with quantization, and to control the trade-off between reconstruction error (distortion) and entropy (rate) of the latent image representation. In this paper, we focus on the l...
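As a hedged illustration of the rate-distortion trade-off mentioned above (the generic training objective for learned image compression, not necessarily the exact loss of this paper): the auto-encoder is trained to minimize a weighted sum of a distortion term and a rate term,

\[
\mathcal{L} \;=\; d\bigl(x, \hat{x}\bigr) \;+\; \beta\, H\bigl(\hat{y}\bigr),
\]

where d(x, x̂) measures the reconstruction error between the image x and its reconstruction x̂, H(ŷ) is the estimated entropy (expected codelength) of the quantized latent representation ŷ under a probability model, and β balances the two terms; d, H, ŷ and β are generic placeholders here.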


Regularized Auto-Encoders Estimate Local Statistics

What do auto-encoders learn about the underlying data generating distribution? Recent work suggests that some auto-encoder variants do a good job of capturing the local manifold structure of the unknown data generating density. This paper clarifies some of these previous intuitive observations by showing that minimizing a particular form of regularized reconstruction error yields a reconstructi...


Learning invariant features through local space contraction

We present in this paper a novel approach for training deterministic auto-encoders. We show that by adding a well chosen penalty term to the classical reconstruction cost function, we can achieve results that equal or surpass those attained by other regularized auto-encoders as well as denoising auto-encoders on a range of datasets. This penalty term corresponds to the Frobenius norm of the Jac...
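For reference, the penalty referred to above is usually written as the squared Frobenius norm of the Jacobian of the encoder at the input x (a standard formulation; the exact notation and weighting in the paper may differ), writing f for the encoder:

\[
\bigl\| J_f(x) \bigr\|_F^2 \;=\; \sum_{i,j} \left( \frac{\partial f_j(x)}{\partial x_i} \right)^{2},
\]

added to the reconstruction cost with a scaling hyperparameter. Penalizing this term encourages features that vary little under small perturbations of the input, i.e., a locally contractive feature map.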


Stacked What-Where Auto-encoders

We present a novel architecture, the “stacked what-where auto-encoders” (SWWAE), which integrates discriminative and generative pathways and provides a unified approach to supervised, semi-supervised and unsupervised learning without relying on sampling during training. An instantiation of SWWAE uses a convolutional net (Convnet) (LeCun et al. (1998)) to encode the input, and employs a deconvol...



Journal:
  • CoRR

Volume: abs/1403.7752
Pages: -
Publication date: 2014